Multimodal as Learning to Rank
ColBERTv2 instead of CLIP
retrieval across multiple modalities
masking
incorporate prompting
focal loss instead of the SigLIP sigmoid loss
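
A minimal sketch of what swapping CLIP-style scoring for ColBERTv2-style scoring could look like: rather than one dot product between two pooled embeddings, each query token takes its best-matching document token or image patch, and the maxima are summed (late interaction, "MaxSim"). The function names, the toy 2-d embeddings, and the reading of "masking" as a padding mask on document tokens are all assumptions made for illustration; the notes do not specify them.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens, doc_tokens, doc_mask=None):
    """Late-interaction score: for each query token, take the maximum cosine
    similarity over the unmasked document tokens, then sum over query tokens.

    doc_mask is a hypothetical padding mask (True = keep the token).
    """
    q = [l2_normalize(v) for v in query_tokens]
    d = [l2_normalize(v) for v in doc_tokens]
    if doc_mask is None:
        doc_mask = [True] * len(d)
    kept = [v for v, keep in zip(d, doc_mask) if keep]
    if not kept:
        return 0.0
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in kept)
    return total


def rank(query_tokens, candidates):
    """Sort candidates (name -> token or patch embeddings) by MaxSim, best first."""
    scored = [(name, maxsim_score(query_tokens, toks)) for name, toks in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example: a two-token query against two candidates from any modality.
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "image_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query tokens
    "image_b": [[1.0, 0.0], [1.0, 0.0]],  # covers only the first query token
}
ranking = rank(query, candidates)
```

In this toy case image_a ranks above image_b because it matches both query tokens. Since the score only needs per-token embeddings, the same function applies whether the candidate tokens come from text, image patches, or another modality.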
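
The focal-loss idea can be sketched the same way. The SigLIP loss treats every (query, candidate) pair in a batch as an independent binary classification: label +1 on the diagonal, -1 elsewhere, with the pair loss being -log sigmoid(label * logit). A focal variant multiplies each pair's term by (1 - p_t)^gamma, where p_t is the probability assigned to the correct label, so the many easy negatives contribute less. The function name and the toy similarity matrix below are hypothetical, the temperature and bias values are illustrative defaults, and the optional alpha class weighting of focal loss is left out. Setting gamma = 0 recovers the plain sigmoid loss.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_focal_sigmoid_loss(sim, temperature=10.0, bias=-10.0, gamma=2.0):
    """Pairwise sigmoid loss with a focal modulating factor.

    sim[i][j] is the similarity between query i and candidate j; matching
    pairs sit on the diagonal. gamma=0 gives the unmodulated sigmoid loss.
    temperature and bias are illustrative values, not tuned settings.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            logit = temperature * sim[i][j] + bias
            log_pt = log_sigmoid(label * logit)  # log-prob of the correct label
            pt = math.exp(log_pt)
            total += -((1.0 - pt) ** gamma) * log_pt
    return total / n


# Toy 2x2 similarity matrix (assumed values, for illustration only).
sim = [[0.9, 0.1],
       [0.2, 0.8]]
plain_loss = pairwise_focal_sigmoid_loss(sim, gamma=0.0)
focal_loss = pairwise_focal_sigmoid_loss(sim, gamma=2.0)
```

Because the modulating factor never exceeds 1, the focal value is never larger than the plain sigmoid value on the same batch; whether the down-weighting helps ranking quality is an open question for these notes, not something the sketch demonstrates.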